TL;DR
Random Forest classifier on 10,000 customers with 80:20 class imbalance (non-churn:churn). Compared SMOTE vs class-weight adjustment: SMOTE improved churn precision by 12 pp over the baseline (0.61 to 0.73), while class-weight adjustment gave the highest recall (0.84, 18 pp above SMOTE). Best overall model (SMOTE plus a tuned decision threshold) achieved F1 = 0.76 on the churn class. Top three predictors by feature importance: age, balance, credit score.
10,000 Customers
80:20 Class Imbalance
F1 = 0.76 (Churn Class)
Random Forest
Python
Machine Learning
SMOTE
Classification
Class Imbalance
Project Overview
Customer churn in banking is a direct revenue loss: acquiring a new customer costs 5–7× more than retaining an existing one. This project builds a churn prediction system on 10,000 bank customers to identify high-risk customers before they leave — enabling the bank to deploy targeted retention interventions.
The dataset has an 80:20 class imbalance (8,000 non-churners : 2,000 churners). Without addressing this, a naive classifier achieves 80% accuracy simply by predicting "no churn" for everyone — useless for the actual business problem. The core engineering challenge was choosing the right imbalance-handling strategy depending on business priority: catching more churners (recall) vs avoiding false alarms (precision).
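The accuracy trap described above can be checked with a few lines of arithmetic. This is a minimal sketch that only reproduces the 8,000 : 2,000 label counts; the always-"no churn" predictor is the hypothetical naive classifier from the text, not a model from the project.

```python
# Labels mirroring the 80:20 split: 0 = stayed, 1 = churned.
y_true = [0] * 8000 + [1] * 2000

# A "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_pos = sum(y_true)

accuracy = correct / len(y_true)
churn_recall = true_pos / actual_pos

print(f"accuracy={accuracy:.2f}, churn recall={churn_recall:.2f}")
# accuracy=0.80, churn recall=0.00
```

High accuracy with zero churn recall is why the comparison in this project is reported on churn-class precision, recall and F1 rather than accuracy.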
Model Comparison: SMOTE vs Class-Weight Adjustment
| Approach | Precision (Churn) | Recall (Churn) | F1 (Churn) | Best For |
| --- | --- | --- | --- | --- |
| No imbalance handling | 0.61 | 0.48 | 0.54 | - |
| SMOTE oversampling | 0.73 | 0.66 | 0.69 | Minimising false positives (targeted retention spend) |
| Class weight adjustment | 0.61 | 0.84 | 0.71 | Maximising recall (catching the most churners) |
| SMOTE + Tuned threshold | 0.68 | 0.78 | 0.76 | Best overall balance (recommended) |
The best-performing configuration combined SMOTE oversampling with threshold tuning (decision threshold lowered from 0.5 to 0.38), achieving F1 = 0.76 on the churn class, a 41% relative improvement over the baseline F1 of 0.54.
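Threshold tuning of this kind can be sketched as a sweep over candidate cut-offs, keeping the one with the highest churn-class F1. The helper names, the toy labels and the toy probabilities below are all assumptions for illustration; they are not the project's validation scores, and the threshold they select is unrelated to the 0.38 reported above.

```python
def precision_recall_f1(y_true, y_prob, threshold):
    """Churn-class precision, recall and F1 at a given decision threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the highest churn-class F1."""
    return max(candidates, key=lambda t: precision_recall_f1(y_true, y_prob, t)[2])


# Toy validation data (made up for illustration).
y_val = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
p_val = [0.05, 0.10, 0.20, 0.30, 0.45, 0.55, 0.35, 0.42, 0.60, 0.90]

candidates = [round(0.05 * k, 2) for k in range(1, 20)]  # 0.05 .. 0.95
chosen = best_threshold(y_val, p_val, candidates)
print(chosen, precision_recall_f1(y_val, p_val, chosen))
```

The sweep should run on validation data (or inside cross-validation), not on the held-out test set, otherwise the chosen threshold is itself fitted to the test labels.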
Key Insights
- Class imbalance is a business problem, not just a technical one — the right approach depends on whether the bank prioritises precision (targeted spend) or recall (catching all churners). There is no universally "best" answer.
- Age (35–50), account balance, and credit score below 600 are the top three churn predictors — actionable signals for proactive outreach.
- Germany-based customers churn at 2× the rate of French and Spanish customers — suggesting regional service or product gaps worth investigating.
- Customers with 1 product churn at significantly higher rates than those with 2+ products — cross-selling is a directly addressable retention lever.
- A customer flagged as high churn risk with an estimated lifetime value above £5,000 justifies a personalised retention call; below that threshold, an automated email campaign is more cost-effective.
Technical Implementation
Preprocessing & Feature Engineering:
- Handled missing values and outliers (balance had right-skewed distribution — log transform applied).
- Encoded categorical variables: Geography (one-hot), Gender (binary).
- StandardScaler applied to numerical features so that SMOTE's distance-based nearest-neighbour search is not dominated by features with large numeric ranges.
- Created interaction feature: balance_to_salary_ratio, a stronger predictor than either balance or salary alone.
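The preprocessing steps listed above might look roughly like the following. This is a standard-library sketch under stated assumptions: the field names and records are hypothetical, `log1p` is one possible choice of log transform (it stays defined for zero balances), and the zero-salary guard is not taken from the project.

```python
import math
from statistics import mean, pstdev

# Hypothetical records; the field names are placeholders for illustration.
customers = [
    {"balance": 0.0, "estimated_salary": 50_000.0},
    {"balance": 25_000.0, "estimated_salary": 100_000.0},
    {"balance": 120_000.0, "estimated_salary": 40_000.0},
]

for c in customers:
    # log1p(x) = log(1 + x) is defined at zero, so empty accounts are safe.
    c["log_balance"] = math.log1p(c["balance"])
    # Interaction feature; the guard against a zero salary is an assumption.
    salary = c["estimated_salary"]
    c["balance_to_salary_ratio"] = c["balance"] / salary if salary > 0 else 0.0


def standardise(values):
    """Zero-mean, unit-variance scaling (population std, as StandardScaler uses)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


scaled_ratio = standardise([c["balance_to_salary_ratio"] for c in customers])
print(scaled_ratio)
```

In a real pipeline the mean and standard deviation would be estimated on the training data only and then reused to transform validation and test data.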
Modelling Approaches:
- Random Forest + SMOTE: SMOTE applied only to training set (not test — critical to avoid data leakage).
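- Random Forest + Class Weight: class_weight='balanced' in sklearn, which weights each class inversely to its frequency (n_samples / (n_classes × class count)); for an 8,000 : 2,000 split that is 0.625 for non-churn and 2.5 for churn.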
- 5-fold stratified cross-validation on all configurations to ensure consistent evaluation across imbalanced folds.
- Hyperparameter tuning via RandomizedSearchCV: n_estimators, max_depth, min_samples_split.
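The class-weight configuration with stratified cross-validation and randomised search could be wired together roughly as follows. This is a sketch, not the project's code: the data comes from `make_classification` as a synthetic stand-in, and the search ranges, `n_iter` and random seeds are assumptions, since the original values are not stated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in for the bank data with roughly an 80:20 class split.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Illustrative search space; the project's actual ranges are not stated.
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [4, 8, None],
    "min_samples_split": [2, 5, 10],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(class_weight="balanced", random_state=0),
    param_distributions=param_distributions,
    n_iter=4,
    scoring="f1",  # F1 of the positive (churn) class
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

For the SMOTE variant, the oversampler would need to sit inside the cross-validated pipeline (for example via the Pipeline class in imbalanced-learn) so that it is refit on each training fold and never sees the validation fold.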
Key Learnings
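- SMOTE must only be applied to training data. Applying it before the train/test split inflates evaluation metrics: synthetic samples are interpolated from points that later land in the test set, so information about test points leaks into training, and the test set itself ends up containing synthetic rather than real customers. This is a common mistake and a red flag in DS interviews.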
- Threshold tuning is underutilised — most projects stop at 0.5 threshold. Moving the threshold to 0.38 improved F1 by 7 points without changing the model at all, purely by redefining what constitutes a "positive" prediction.
- Feature importance from Random Forest is a starting point, not a conclusion — correlated features can dilute each other's importance scores. SHAP values would give a more accurate and model-agnostic attribution.
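The split-then-oversample order from the first learning can be illustrated with a simplified, standard-library imitation of SMOTE's interpolation step. The `smote_like` helper and the toy points are hypothetical and exist only for illustration; a real project would use a maintained implementation such as the SMOTE class in imbalanced-learn.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy data: (feature_1, feature_2) tuples, made up for illustration.
majority = [(float(i), float(i % 3)) for i in range(16)]
minority = [(20.0, 1.0), (21.0, 2.0), (22.0, 1.5), (23.0, 0.5)]

# 1) Split first: hold out the last quarter of each class as the test set.
train_major, test_major = majority[:12], majority[12:]
train_minor, test_minor = minority[:3], minority[3:]

# 2) Oversample only the training minority up to the training majority size.
new_points = smote_like(train_minor, n_new=len(train_major) - len(train_minor))
balanced_train_minor = train_minor + new_points
print(len(train_major), len(balanced_train_minor), len(test_minor))
```

Because the synthetic points are built only from training rows, the held-out minority point never influences them, and the test set keeps its original class ratio.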
Future Work
- Add SHAP values for model interpretability — Random Forest feature importance doesn't account for feature correlation, which skews attribution for correlated predictors.
- Build a simple customer LTV model to weight churn predictions by business impact — a high-probability churn on a low-value customer is less urgent than a moderate-probability churn on a high-value one.
- Evaluate gradient boosting (XGBoost/LightGBM) — they typically outperform Random Forest on structured tabular data with class imbalance.
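One simple way to weight churn predictions by business impact, as proposed in the LTV item above, is to rank customers by expected revenue at risk (churn probability times lifetime value). The customer records, probabilities and LTV figures below are made up for illustration, and the multiplication rule is an assumption about how such a weighting might be done.

```python
# Hypothetical scored customers; probabilities and LTVs are made-up numbers.
scored = [
    {"id": "A", "p_churn": 0.90, "ltv": 400.0},
    {"id": "B", "p_churn": 0.45, "ltv": 9_000.0},
    {"id": "C", "p_churn": 0.70, "ltv": 2_000.0},
]

for c in scored:
    # Expected revenue at risk = churn probability x lifetime value.
    c["expected_loss"] = c["p_churn"] * c["ltv"]

ranked = sorted(scored, key=lambda c: c["expected_loss"], reverse=True)
print([c["id"] for c in ranked])
# ['B', 'C', 'A']
```

Here the moderate-probability, high-value customer B outranks the high-probability, low-value customer A, which is the prioritisation described in the text.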
Built by Om Patel — ML Engineer & Data Scientist.
Explore more projects on my Portfolio.